Overview

Dataset statistics

Number of variables: 9
Number of observations: 768
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 54.1 KiB
Average record size in memory: 72.2 B

Variable types

NUM: 8
BOOL: 1

Reproduction

Analysis started: 2020-05-12 20:52:04.494122
Analysis finished: 2020-05-12 20:52:19.975763
Version: pandas-profiling v2.6.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

#Gravidezes has 111 (14.5%) zeros
PD has 35 (4.6%) zeros
DobraTricepes has 227 (29.6%) zeros
Insulina has 374 (48.7%) zeros
IMC has 11 (1.4%) zeros
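
The percentages in these zero warnings are the zero count divided by the number of observations (768 in the overview). The following is a minimal sketch of that arithmetic using only the counts reported above; the helper name `zero_share` is hypothetical and is not part of pandas-profiling.

```python
# Number of observations, taken from the dataset statistics in the overview.
n_observations = 768

# Zero counts copied from the warnings above.
zero_counts = {
    "#Gravidezes": 111,
    "PD": 35,
    "DobraTricepes": 227,
    "Insulina": 374,
    "IMC": 11,
}


def zero_share(count, total):
    """Return the percentage of observations equal to zero, to one decimal."""
    return round(100 * count / total, 1)


# Map each flagged variable to its share of zeros.
zero_shares = {name: zero_share(c, n_observations) for name, c in zero_counts.items()}

for name, share in zero_shares.items():
    print(f"{name}: {share}% zeros")
```

The same calculation applies to the "Zeros (%)" entry listed for each variable further down.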

Variables

#Gravidezes
Real number (ℝ≥0)

ZEROS
Distinct count: 17
Unique (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.845052083
Minimum: 0
Maximum: 17
Zeros: 111
Zeros (%): 14.5%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 3
Q3: 6
95-th percentile: 10
Maximum: 17
Range: 17
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.369578063
Coefficient of variation (CV): 0.8763413316
Kurtosis: 0.1592197775
Mean: 3.845052083
Median Absolute Deviation (MAD): 2.771620009
Skewness: 0.9016739792
Sum: 2953
Variance: 11.35405632
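
Several of the statistics above are derived from others in the same block. As a rough consistency check, the sketch below recomputes the range, IQR, coefficient of variation, variance and mean from the values reported for #Gravidezes. The variable names are illustrative only, and the formulas are the standard textbook definitions rather than anything taken from the pandas-profiling source.

```python
# Values copied from the #Gravidezes statistics above.
mean = 3.845052083
std = 3.369578063
minimum, maximum = 0, 17
q1, q3 = 1, 6
total = 2953          # "Sum"
n_observations = 768  # from the dataset overview

value_range = maximum - minimum          # Range = Maximum - Minimum
iqr = q3 - q1                            # IQR = Q3 - Q1
cv = std / mean                          # CV = standard deviation / mean
variance = std ** 2                      # Variance = standard deviation squared
mean_from_sum = total / n_observations   # Mean = Sum / number of observations

print("Range:", value_range)
print("IQR:", iqr)
print("CV:", cv)
print("Variance:", variance)
print("Mean:", mean_from_sum)
```

The recomputed values should agree with the reported ones up to rounding of the printed inputs.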
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 2.5 5.5 8.5 10.5 13.5 17. ], "bayesian blocks" binning strategy used)

Common values

Value             Count  Frequency (%)
1                 135    17.6%
0                 111    14.5%
2                 103    13.4%
3                 75     9.8%
4                 68     8.9%
5                 57     7.4%
6                 50     6.5%
7                 45     5.9%
8                 38     4.9%
9                 28     3.6%
Other values (7)  58     7.6%

Minimum 5 values

Value  Count  Frequency (%)
0      111    14.5%
1      135    17.6%
2      103    13.4%
3      75     9.8%
4      68     8.9%

Maximum 5 values

Value  Count  Frequency (%)
17     1      0.1%
15     1      0.1%
14     2      0.3%
13     10     1.3%
12     9      1.2%

Glicose
Real number (ℝ≥0)

Distinct count: 136
Unique (%): 17.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 120.8945312
Minimum: 0
Maximum: 199
Zeros: 5
Zeros (%): 0.7%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 79
Q1: 99
median: 117
Q3: 140.25
95-th percentile: 181
Maximum: 199
Range: 199
Interquartile range (IQR): 41.25

Descriptive statistics

Standard deviation: 31.9726182
Coefficient of variation (CV): 0.2644670347
Kurtosis: 0.6407798204
Mean: 120.8945312
Median Absolute Deviation (MAD): 25.18179321
Skewness: 0.1737535018
Sum: 92847
Variance: 1022.248314
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 67.5 79.5 98.5 129.5 147.5 199. ], "bayesian blocks" binning strategy used)

Common values

Value               Count  Frequency (%)
100                 17     2.2%
99                  17     2.2%
129                 14     1.8%
125                 14     1.8%
111                 14     1.8%
106                 14     1.8%
95                  13     1.7%
108                 13     1.7%
105                 13     1.7%
102                 13     1.7%
Other values (126)  626    81.5%

Minimum 5 values

Value  Count  Frequency (%)
0      5      0.7%
44     1      0.1%
56     1      0.1%
57     2      0.3%
61     1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
199    1      0.1%
198    1      0.1%
197    4      0.5%
196    3      0.4%
195    2      0.3%

PD
Real number (ℝ≥0)

ZEROS
Distinct count: 47
Unique (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 69.10546875
Minimum: 0
Maximum: 122
Zeros: 35
Zeros (%): 4.6%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 38.7
Q1: 62
median: 72
Q3: 80
95-th percentile: 90
Maximum: 122
Range: 122
Interquartile range (IQR): 18

Descriptive statistics

Standard deviation: 19.35580717
Coefficient of variation (CV): 0.2800908166
Kurtosis: 5.18015656
Mean: 69.10546875
Median Absolute Deviation (MAD): 12.63942464
Skewness: -1.843607983
Sum: 53073
Variance: 374.6472712
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 12. 42. 49. 59. ... 61.5 81. 91. 97. 122. ], "bayesian blocks" binning strategy used)

Common values

Value              Count  Frequency (%)
70                 57     7.4%
74                 52     6.8%
68                 45     5.9%
78                 45     5.9%
72                 44     5.7%
64                 43     5.6%
80                 40     5.2%
76                 39     5.1%
60                 37     4.8%
0                  35     4.6%
Other values (37)  331    43.1%

Minimum 5 values

Value  Count  Frequency (%)
0      35     4.6%
24     1      0.1%
30     2      0.3%
38     1      0.1%
40     1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
122    1      0.1%
114    1      0.1%
110    3      0.4%
108    2      0.3%
106    3      0.4%

DobraTricepes
Real number (ℝ≥0)

ZEROS
Distinct count: 51
Unique (%): 6.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.53645833
Minimum: 0
Maximum: 99
Zeros: 227
Zeros (%): 29.6%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 23
Q3: 32
95-th percentile: 44
Maximum: 99
Range: 99
Interquartile range (IQR): 32

Descriptive statistics

Standard deviation: 15.95221757
Coefficient of variation (CV): 0.776775494
Kurtosis: -0.5200718662
Mean: 20.53645833
Median Absolute Deviation (MAD): 13.65962728
Skewness: 0.1093724965
Sum: 15772
Variance: 254.4732453
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 3.5 9. 16.5 26.5 33.5 42.5 50.5 61.5 99. ], "bayesian blocks" binning strategy used)

Common values

Value              Count  Frequency (%)
0                  227    29.6%
32                 31     4.0%
30                 27     3.5%
27                 23     3.0%
23                 22     2.9%
33                 20     2.6%
18                 20     2.6%
28                 20     2.6%
31                 19     2.5%
39                 18     2.3%
Other values (41)  341    44.4%

Minimum 5 values

Value  Count  Frequency (%)
0      227    29.6%
7      2      0.3%
8      2      0.3%
10     5      0.7%
11     6      0.8%

Maximum 5 values

Value  Count  Frequency (%)
99     1      0.1%
63     1      0.1%
60     1      0.1%
56     1      0.1%
54     2      0.3%

Insulina
Real number (ℝ≥0)

ZEROS
Distinct count: 186
Unique (%): 24.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 79.79947917
Minimum: 0
Maximum: 846
Zeros: 374
Zeros (%): 48.7%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 30.5
Q3: 127.25
95-th percentile: 293
Maximum: 846
Range: 846
Interquartile range (IQR): 127.25

Descriptive statistics

Standard deviation: 115.2440024
Coefficient of variation (CV): 1.444169856
Kurtosis: 7.214259554
Mean: 79.79947917
Median Absolute Deviation (MAD): 84.50507948
Skewness: 2.272250858
Sum: 61286
Variance: 13281.18008
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 7. 34. 141. 212.5 332.5 562. 846. ], "bayesian blocks" binning strategy used)

Common values

Value               Count  Frequency (%)
0                   374    48.7%
105                 11     1.4%
140                 9      1.2%
130                 9      1.2%
120                 8      1.0%
100                 7      0.9%
94                  7      0.9%
180                 7      0.9%
110                 6      0.8%
115                 6      0.8%
Other values (176)  324    42.2%

Minimum 5 values

Value  Count  Frequency (%)
0      374    48.7%
14     1      0.1%
15     1      0.1%
16     1      0.1%
18     2      0.3%

Maximum 5 values

Value  Count  Frequency (%)
846    1      0.1%
744    1      0.1%
680    1      0.1%
600    1      0.1%
579    1      0.1%

IMC
Real number (ℝ≥0)

ZEROS
Distinct count: 248
Unique (%): 32.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 31.99257812
Minimum: 0
Maximum: 67.1
Zeros: 11
Zeros (%): 1.4%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 21.8
Q1: 27.3
median: 32
Q3: 36.6
95-th percentile: 44.395
Maximum: 67.1
Range: 67.1
Interquartile range (IQR): 9.3

Descriptive statistics

Standard deviation: 7.88416032
Coefficient of variation (CV): 0.2464371671
Kurtosis: 3.290442901
Mean: 31.99257812
Median Absolute Deviation (MAD): 5.842269897
Skewness: -0.4289815885
Sum: 24570.3
Variance: 62.15998396
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 19.2 21.75 24.15 39.55 46.4 53.05 67.1 ], "bayesian blocks" binning strategy used)

Common values

Value               Count  Frequency (%)
32                  13     1.7%
31.6                12     1.6%
31.2                12     1.6%
0                   11     1.4%
33.3                10     1.3%
32.4                10     1.3%
32.8                9      1.2%
30.8                9      1.2%
32.9                9      1.2%
30.1                9      1.2%
Other values (238)  664    86.5%

Minimum 5 values

Value  Count  Frequency (%)
0      11     1.4%
18.2   3      0.4%
18.4   1      0.1%
19.1   1      0.1%
19.3   1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
67.1   1      0.1%
59.4   1      0.1%
57.3   1      0.1%
55     1      0.1%
53.2   1      0.1%

DiabetesPedigreeFunction
Real number (ℝ≥0)

Distinct count: 517
Unique (%): 67.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4718763021
Minimum: 0.078
Maximum: 2.42
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0.078
5-th percentile: 0.14035
Q1: 0.24375
median: 0.3725
Q3: 0.62625
95-th percentile: 1.13285
Maximum: 2.42
Range: 2.342
Interquartile range (IQR): 0.3825

Descriptive statistics

Standard deviation: 0.331328595
Coefficient of variation (CV): 0.7021513764
Kurtosis: 5.594953528
Mean: 0.4718763021
Median Absolute Deviation (MAD): 0.24730857
Skewness: 1.919911066
Sum: 362.401
Variance: 0.1097786379
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.078 0.1245 0.2535 0.2705 0.456 0.7465 0.969 1.4685 2.42 ], "bayesian blocks" binning strategy used)

Common values

Value               Count  Frequency (%)
0.254               6      0.8%
0.258               6      0.8%
0.259               5      0.7%
0.238               5      0.7%
0.207               5      0.7%
0.268               5      0.7%
0.261               5      0.7%
0.167               4      0.5%
0.19                4      0.5%
0.27                4      0.5%
Other values (507)  719    93.6%

Minimum 5 values

Value  Count  Frequency (%)
0.078  1      0.1%
0.084  1      0.1%
0.085  2      0.3%
0.088  2      0.3%
0.089  1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
2.42   1      0.1%
2.329  1      0.1%
2.288  1      0.1%
2.137  1      0.1%
1.893  1      0.1%

Idade
Real number (ℝ≥0)

Distinct count: 52
Unique (%): 6.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.24088542
Minimum: 21
Maximum: 81
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 21
5-th percentile: 21
Q1: 24
median: 29
Q3: 41
95-th percentile: 58
Maximum: 81
Range: 60
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 11.76023154
Coefficient of variation (CV): 0.3537881556
Kurtosis: 0.6431588885
Mean: 33.24088542
Median Absolute Deviation (MAD): 9.586405436
Skewness: 1.129596701
Sum: 25529
Variance: 138.3030459
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[21. 21.5 22.5 29.5 46.5 58.5 69.5 81. ], "bayesian blocks" binning strategy used)

Common values

Value              Count  Frequency (%)
22                 72     9.4%
21                 63     8.2%
25                 48     6.2%
24                 46     6.0%
23                 38     4.9%
28                 35     4.6%
26                 33     4.3%
27                 32     4.2%
29                 29     3.8%
31                 24     3.1%
Other values (42)  348    45.3%

Minimum 5 values

Value  Count  Frequency (%)
21     63     8.2%
22     72     9.4%
23     38     4.9%
24     46     6.0%
25     48     6.2%

Maximum 5 values

Value  Count  Frequency (%)
81     1      0.1%
72     1      0.1%
70     1      0.1%
69     2      0.3%
68     1      0.1%

Classe
Boolean

Distinct count: 2
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.1 KiB

Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect r; only the sign of its slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
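
The calculation described above can be sketched in plain Python as follows. This is an illustrative implementation of the textbook formula, not the code pandas-profiling runs; the function name `pearson_r` and the toy data are assumptions made for the example.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors of the covariance and the standard
    # deviations cancel, so plain sums of products are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Toy data (not taken from the dataset): a noisy increasing relationship.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

r_original = pearson_r(xs, ys)
# Shifting and positively rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([10 + 3 * x for x in xs], [0.5 * y - 7 for y in ys])
print(r_original, r_rescaled)
```

The second call illustrates the invariance under changes in location and scale mentioned above.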

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
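
A minimal sketch of this rank-based calculation is shown below. It assumes that tied values receive the average of their ranks, which is a common convention but is not stated in the text above; the helper names `average_ranks` and `spearman_rho` are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(ss_x * ss_y)


# Toy data: y = x**3 is nonlinear but strictly increasing.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys))
```

Because only the ordering of the values matters, any strictly increasing transformation of either variable leaves ρ unchanged.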

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
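
The pair-counting definition above can be sketched directly. Taken literally, it corresponds to the variant usually called τ-a, which makes no correction for ties; statistical libraries often compute the tie-corrected τ-b instead (for example, SciPy's `kendalltau` does so by default), so results can differ on data with repeated values. The function name `kendall_tau_a` and the toy data are assumptions made for the example.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A pair is concordant when both variables move in the same
        # direction, discordant when they move in opposite directions.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # Tied pairs (direction == 0) count towards the total only.
    return (concordant - discordant) / len(pairs)


# Toy data: three pairs in total, two concordant and one discordant.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

This brute-force version examines every pair, so its cost grows quadratically with the number of observations.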

Missing values

Sample

First rows

     #Gravidezes  Glicose  PD  DobraTricepes  Insulina  IMC   DiabetesPedigreeFunction  Idade  Classe
0    6            148      72  35             0         33.6  0.627                     50     1
1    1            85       66  29             0         26.6  0.351                     31     0
2    8            183      64  0              0         23.3  0.672                     32     1
3    1            89       66  23             94        28.1  0.167                     21     0
4    0            137      40  35             168       43.1  2.288                     33     1
5    5            116      74  0              0         25.6  0.201                     30     0
6    3            78       50  32             88        31.0  0.248                     26     1
7    10           115      0   0              0         35.3  0.134                     29     0
8    2            197      70  45             543       30.5  0.158                     53     1
9    8            125      96  0              0         0.0   0.232                     54     1

Last rows

     #Gravidezes  Glicose  PD  DobraTricepes  Insulina  IMC   DiabetesPedigreeFunction  Idade  Classe
758  1            106      76  0              0         37.5  0.197                     26     0
759  6            190      92  0              0         35.5  0.278                     66     1
760  2            88       58  26             16        28.4  0.766                     22     0
761  9            170      74  31             0         44.0  0.403                     43     1
762  9            89       62  0              0         22.5  0.142                     33     0
763  10           101      76  48             180       32.9  0.171                     63     0
764  2            122      70  27             0         36.8  0.340                     27     0
765  5            121      72  23             112       26.2  0.245                     30     0
766  1            126      60  0              0         30.1  0.349                     47     1
767  1            93       70  31             0         30.4  0.315                     23     0